bayesynergy: flexible Bayesian modelling of synergistic interaction effects in in-vitro drug combination experiments

Leiv Rønneberg

7/4/2021


Introduction

The bayesynergy R package implements a Bayesian semi-parametric regression model for estimating the dose-response function of in-vitro drug combination experiments. The Bayesian framework offers full uncertainty quantification of the dose-response function and any derived summary statistics, as well as natural handling of replicates and missing data. The model is implemented in Stan (Stan Development Team (2020)), taking advantage of the efficient ‘No-U-Turn Sampler’ as well as variational Bayes for quick approximations of the true posterior.

The package is further equipped with plotting functions for summarizing a single drug combination experiment, as well as parallel processing for large drug combination screens, together with tools for summarizing and comparing the resulting fits.

The model

The dose-response function \(f:\boldsymbol{x} \to (0,1)\), maps drug concentrations \(\boldsymbol{x}\) to a measure of cell viability – zero corresponding to all cells being dead after treatment, one corresponding to all cells still alive. In drug-combination screens, it is common to assume that the dose-response function can be broken down as \[ f(\boldsymbol{x}) = p_0(\boldsymbol{x})+\Delta(\boldsymbol{x}), \] where \(p_0(\boldsymbol{x})\) encodes a non-interaction assumption, and \(\Delta(\boldsymbol{x})\) captures the residual interaction effect.

Non-interaction

The non-interaction assumption, \(p_0(\boldsymbol{x})\), captures what can reasonably be assumed about a joint drug effect, given estimates of the drugs’ individual effects. We assume a Bliss-style independence assumption, where we first assume that each drug’s individual dose-response function takes the form of a log-logistic curve \[ h_j(x_j|l,s,m) = l + \frac{1-l}{1+10^{s(x_j-m)}}, \] where \(l\) is the lower asymptote, \(s\) the slope, and \(m\) the drug’s ‘EC-50’ on the \(\log_{10}\) scale. The Bliss assumption then amounts to a probabilistic independence assumption, where \[ p_0(\boldsymbol{x}) = h_1(x_1|l_1,s_1,m_1) \ h_2(x_2|l_2,s_2,m_2). \] We call it probabilistic because we can interpret the individual dose-response curves \(h_j()\) as probabilities of cell survival. Defining the events \[ \begin{align} A_i & = \text{A cell survives drug A at concentration $x_{1i}$} \\ B_j & = \text{A cell survives drug B at concentration $x_{2j}$} \\ C_{ij} & = \text{A cell survives both drugs at concentration $\boldsymbol{x}=(x_{1i},x_{2j})$}, \end{align} \] the corresponding probabilities become \[ p_0(\boldsymbol{x}) = P(C_{ij}) = P(A_i)P(B_j) = h_1(x_{1i}|l_1,s_1,m_1) \ h_2(x_{2j}|l_2,s_2,m_2). \]
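As a concrete illustration, here is a minimal Python sketch of the log-logistic curve and the Bliss surface (an illustration of the formulas above, not the package's internal code):

```python
import numpy as np

def h(x, l, s, m):
    """Log-logistic monotherapy curve on the log10 concentration scale.
    l: lower asymptote, s: slope, m: log10 EC-50."""
    return l + (1.0 - l) / (1.0 + 10.0 ** (s * (x - m)))

def p0(x1, x2, par1, par2):
    """Bliss non-interaction surface: product of the two monotherapy curves,
    i.e. P(cell survives both drugs) under independence."""
    return h(x1, *par1) * h(x2, *par2)
```

For example, with zero lower asymptotes, each curve equals 0.5 at its EC-50, so the Bliss surface at that point is 0.25.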

Interaction

The interaction component, \(\Delta(\boldsymbol{x})\), captures any joint effect of the drugs that is not accounted for by the non-interaction assumption. If two drugs are more effective together than would be expected under \(p_0\), we call it synergy; since \(f\) measures viability, this corresponds to \(\Delta < 0\). The opposite effect is deemed antagonism.

Because the interaction landscape can be complex, with multiple local peaks and valleys, we model this term non-parametrically using a Gaussian process (GP) prior. To ensure that the resulting dose-response function only takes values in the interval \((0,1)\), we push the GP through a transformation function \(g()\). That is, \[ z(\boldsymbol{x}) \sim \mathcal{GP}(0,\kappa(\boldsymbol{x},\boldsymbol{x}')) \\ \Delta(\boldsymbol{x}) = g(z(\boldsymbol{x})), \] where the transformation function is given by \[ g(z(\boldsymbol{x})) = \frac{-p_0(\boldsymbol{x})}{1+\exp\left\{b_1z(\boldsymbol{x})+\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}} + \frac{1-p_0(\boldsymbol{x})}{1+\exp\left\{-b_2z(\boldsymbol{x})-\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}}. \] In addition to ensuring the proper bounds for the dose-response function, this transformation has the feature that \(g(0)=0\), which corresponds to an a priori assumption that \[ \mathbb{E}\left[f(\boldsymbol{x}) | p_0(\boldsymbol{x})\right] \approx p_0(\boldsymbol{x}). \] That is, we turn our non-interaction assumption into a formal prior expectation on the dose-response function. This achieves two things: (1) a slightly conservative model that needs to be convinced that interaction effects are present, and (2) no built-in bias toward interaction in the prior structure.
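To see numerically that \(g\) maps the latent GP into the interval \((-p_0, 1-p_0)\), so that \(f = p_0 + \Delta\) stays in \((0,1)\), and that \(g(0)=0\), here is a small Python sketch of the transformation (illustrative only):

```python
import numpy as np

def g(z, p0, b1=1.0, b2=1.0):
    """Transform a latent GP value z into an interaction Delta in (-p0, 1 - p0)."""
    logit_p0 = np.log(p0 / (1.0 - p0))
    neg = -p0 / (1.0 + np.exp(b1 * z + logit_p0))
    pos = (1.0 - p0) / (1.0 + np.exp(-b2 * z - logit_p0))
    return neg + pos
```

At \(z=0\) the two terms cancel exactly, giving \(\Delta=0\); as \(z \to -\infty\), \(\Delta \to -p_0\) (full synergy, \(f \to 0\)); as \(z \to +\infty\), \(\Delta \to 1-p_0\) (full antagonism, \(f \to 1\)).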

The covariance function \(\kappa(\boldsymbol{x},\boldsymbol{x}')\) can be given multiple specifications, including squared exponential, Matérn, and rational quadratic covariance functions. By default, we use a Matérn covariance with the \(\nu\) parameter set to 3/2, yielding \[ \kappa(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2\left(1+\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right)\exp\left\{-\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right\}. \] Finally, by utilizing the natural grid structure of the drug concentrations, we can write the kernel function as \[ \kappa(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2 \kappa(x_1,x_1')\kappa(x_2,x_2'), \] which induces a Kronecker product structure on the final covariance matrix. Following the implementation detailed in Flaxman et al. (2015), this greatly improves the computational efficiency of the model.
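The Kronecker construction can be sketched in a few lines of Python (an illustration of the algebra; the package's Stan implementation follows Flaxman et al. (2015) and never forms the full matrix this way):

```python
import numpy as np

def matern32(x, xp, ell):
    """Matern covariance with nu = 3/2 for 1-D inputs."""
    r = np.sqrt(3.0) * np.abs(np.subtract.outer(x, xp)) / ell
    return (1.0 + r) * np.exp(-r)

# On a grid, the full covariance is a Kronecker product of the
# per-dimension kernels, so only the small factors need be handled.
x1 = np.linspace(-2.0, 2.0, 5)   # log10 concentrations, drug 1
x2 = np.linspace(-1.0, 3.0, 4)   # log10 concentrations, drug 2
sigma_f2, ell = 1.0, 1.0
K = sigma_f2 * np.kron(matern32(x1, x1, ell), matern32(x2, x2, ell))
```

Eigendecompositions and solves can then be done on the two small factor matrices instead of the full \(n_1 n_2 \times n_1 n_2\) matrix, which is where the computational savings come from.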

The observation model

Given the above formulation of the dose-response function \(f\), we assume that we have access to noisy observations of it. These observations are typically generated by various cellular assays, e.g. viability assays. In particular, given concentration points \(\boldsymbol{x}_1,\ldots,\boldsymbol{x}_n\), we assume observations \(y_1,\ldots,y_n\) with \[ y_i = f(\boldsymbol{x}_i) + \epsilon_i, \] where the errors \(\epsilon_i\) are normally distributed with mean zero. By default, we model the variance of the observational errors in a heteroscedastic fashion as \[ \text{Var}\left[\epsilon_i\right] = \sigma^2(f(\boldsymbol{x}_i)+\lambda), \] where \(\lambda\) is set to a small value to handle the case where \(f = 0\) but there is still some residual noise. In a typical setup, where cell viability is calculated by normalization to positive and negative controls, \(\lambda\) can be set empirically as \[ \lambda = \frac{\sigma^2_{+}}{\sigma^2_{-}}, \] where \(\sigma^2_{+}\) and \(\sigma^2_{-}\) denote the variances of the positive and negative controls, respectively.

We choose a heteroscedastic model by default, because in cell viability assays, the observations are normalized in relation to positive and negative controls. The positive controls typically have much lower variance compared to the negative controls, which translates to viability measures closer to zero being more precisely measured. We also allow homoscedastic noise as an option.

Including controls

The positive and negative controls essentially determine the signal-to-noise ratio in cell viability assays. If the user has access to these, they can be included in the model to help calibrate the posterior distribution, particularly when the experiment has no replicates.

Let \(\xi^-_k\) and \(\xi^+_l\) denote the negative and positive controls for \(k=1,\ldots,n_-\) and \(l=1,\ldots,n_+\). These measurements are raw readings from the plate and are used to calculate cell viability. For an additional well, treated with drug concentration \(\mathbf{x}_i\), we denote the raw output by \(\xi_i\), and calculate cell viability for this well by the formula: \[ y_i = \frac{\xi_i-\tilde{\xi^+}}{\tilde{\xi^-}-\tilde{\xi^+}}, \] where \(\tilde{\xi^-}\) and \(\tilde{\xi^+}\) denote some measure of centrality of the negative and positive controls, typically the mean or median.

The controls can themselves be passed through this function and converted to viability. From the variances of these normalized controls, \(\lambda\) can be set as indicated above, and the negative controls can be added directly into the model. Negative controls represent unhindered cell growth and can be thought of as samples from the dose-response function \(f(\mathbf{x})\) at concentration \(\mathbf{x}=(0,0)\); they can therefore be passed to the \(\texttt{bayesynergy}\) function in the same way as regular observations.
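The normalization and the empirical choice of \(\lambda\) can be sketched in Python (the raw plate readings below are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
neg_raw = rng.normal(1000.0, 50.0, size=32)  # negative controls: untreated wells
pos_raw = rng.normal(100.0, 10.0, size=32)   # positive controls: fully killed wells

def to_viability(xi, neg, pos):
    """Normalize a raw reading to viability using the mean controls."""
    return (xi - pos.mean()) / (neg.mean() - pos.mean())

neg_v = to_viability(neg_raw, neg_raw, pos_raw)  # centred at 1 by construction
pos_v = to_viability(pos_raw, neg_raw, pos_raw)  # centred at 0 by construction
lam = pos_v.var(ddof=1) / neg_v.var(ddof=1)      # empirical lambda
```

Because the positive controls here are far less variable than the negative ones, the resulting \(\lambda\) is small, matching the heteroscedastic noise model above.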

Full model specification

The full model specification, with all default prior distributions, looks like \[ y_i \sim \mathcal{N}\left(f(\boldsymbol{x}_i),\sigma^2(f(\boldsymbol{x}_i)+\lambda)\right), \ i = 1,\ldots, n \\ \sigma \sim \text{Inv-Ga}\left(5,1\right), \ \lambda = 0.005 \\ f(\boldsymbol{x}_i) = p_0(\boldsymbol{x}_i)+\Delta(\boldsymbol{x}_i) \mathbb{I}(10^{\boldsymbol{x}_i}>0) \\ p_0(\boldsymbol{x}) = h_1(x_1|l_1,s_1,m_1) \ h_2(x_2|l_2,s_2,m_2) \\ l_j \sim \text{Beta}(1,1.25), \ s_j \sim \text{Gamma}(1,1), \ j = 1,2 \\ m_j \sim \mathcal{N}(\theta_j,\sigma_{m_j}^2), \ j = 1,2 \\ \theta_j \sim \mathcal{N}(0,1), \ \sigma_{m_j}^2 \sim \text{Inv-Ga}\left(3,2\right), \ j = 1,2 \\ \Delta(\boldsymbol{x}) = g(z(\boldsymbol{x})), \ z(\boldsymbol{x}) \sim \mathcal{GP}(0,\kappa(\boldsymbol{x},\boldsymbol{x}')) \\ g(z(\boldsymbol{x})) = \frac{-p_0(\boldsymbol{x})}{1+\exp\left\{b_1z(\boldsymbol{x})+\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}} + \frac{1-p_0(\boldsymbol{x})}{1+\exp\left\{-b_2z(\boldsymbol{x})-\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}} \\ \kappa(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2\left(1+\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right)\exp\left\{-\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right\}, \\ \sigma_f^2 \sim \text{log-}\mathcal{N}(1,1), \ \ell \sim \text{Inv-Ga}(5,5) \\ b_j \sim \mathcal{N}(1,0.1^2), \ j = 1,2. \] Note that some of these specifications can be altered. For example, by default we estimate the lower asymptotes, but they can also be fixed at zero.

In the model specification above, the interaction term is multiplied by an indicator function taking the value 1 if and only if all elements of the raw concentration \(10^{\boldsymbol{x}}\) are strictly positive (the concentrations \(\boldsymbol{x}\) are on the \(\log_{10}\) scale). This ensures that no interaction is allowed when one of the drugs is at zero concentration.

Summary measures

From the posterior dose-response function \(f | \mathbf{y}\), we derive a number of summary statistics concerning efficacy, synergy and antagonism.

Monotherapy summaries

For the monotherapy curves, we produce estimates of the drug sensitivity score (DSS) of each drug by the integral

\[ DSS_0 = \int_a^b \left(1-h_j(x)\right) \text{d}x, \] where \(a=h_j^{-1}(1-\epsilon/2)\) and \(b\) is the maximum concentration of drug \(j\) tested in the screen. That is, the integral is taken from where \(h_j\) crosses an ‘activation threshold’ \(1-\epsilon/2\), to the largest measured concentration. The activation threshold \(\epsilon\) is set to 5%. This value is further standardized by the total area available for drug efficacy, \[ DSS = \frac{DSS_0}{(1-\epsilon/2)(b-a)}. \] From here, values can be further standardized as in Yadav et al. (2014).
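A numerical version of this calculation for a single log-logistic curve can be sketched in Python (a sketch under the definitions above; `x_max` stands in for the maximum tested \(\log_{10}\) concentration, and the threshold crossing is inverted analytically):

```python
import numpy as np

def dss(l, s, m, x_max, eps=0.05):
    """Standardized drug sensitivity score for h(x) = l + (1-l)/(1+10^(s(x-m)))."""
    thresh = 1.0 - eps / 2.0
    # invert h analytically to find where it crosses the activation threshold
    a = m + np.log10((1.0 - l) / (thresh - l) - 1.0) / s
    if a >= x_max:
        return 0.0  # the curve never crosses the threshold in the tested range
    def h(x):
        return l + (1.0 - l) / (1.0 + 10.0 ** (s * (x - m)))
    xs = np.linspace(a, x_max, 2001)
    mid = 0.5 * (xs[:-1] + xs[1:])
    dss0 = np.sum(1.0 - h(mid)) * (xs[1] - xs[0])   # midpoint rule
    return dss0 / ((1.0 - eps / 2.0) * (x_max - a))
```

This sketch returns the standardized score as a fraction; the package appears to report DSS on a 0–100 scale, so multiply by 100 to compare.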

Combination summaries

To summarize the total dose-response function, we utilize the measures developed in Cremaschi et al. (2019). As with the DSS scores, we calculate the volume above the response surface, and denote this the ‘residual volume under the surface’ or rVUS, as it is the volume left over after integrating the surface.

In general, the integral looks like

\[ rVUS_0(f) = \int_a^b \int_c^d \left(1-f(\mathbf{x})\right) \ \text{d}\mathbf{x}, \] where the integrals are taken over the observed drug range, i.e. \(a = \min (x_1)\), \(b = \max (x_1)\), \(c = \min (x_2)\), \(d = \max (x_2)\).

From here, we further standardize the rVUS scores by the total volume available for efficacy:

\[ rVUS(f) = \frac{rVUS_0(f)}{(b-a)(d-c)}. \] The model calculates \(rVUS\) for the dose-response function \(f\), giving a measure of combined efficacy. In addition, we calculate \(rVUS(p_0)\), the non-interaction efficacy, and \(rVUS(\Delta)\), the interaction effect. For the interaction term \(\Delta\), we also compute \(rVUS(\Delta^{-})\) and \(rVUS(\Delta^{+})\) for synergy and antagonism, where \(\Delta^{+}\) and \(\Delta^{-}\) denote the positive and negative parts of \(\Delta\), respectively. That is,

\[ \Delta^{+}(\mathbf{x}) = \max(0,\Delta(\mathbf{x})) \\ \Delta^{-}(\mathbf{x}) = \max(0,-\Delta(\mathbf{x})). \]
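These volumes can be approximated numerically from a surface evaluated on the concentration grid; here is an illustrative Python sketch using a 2-D trapezoid rule (not the package's code; for the interaction surfaces, the standardized volume of the surface itself is computed rather than of \(1-f\)):

```python
import numpy as np

def vol(surface, x1, x2):
    """Standardized volume of a surface over the grid (2-D trapezoid rule).
    surface[i, j] holds the value at (x1[i], x2[j])."""
    # average the four corners of each grid cell, weight by cell area
    cell = 0.25 * (surface[:-1, :-1] + surface[1:, :-1]
                   + surface[:-1, 1:] + surface[1:, 1:])
    v = np.sum(cell * np.outer(np.diff(x1), np.diff(x2)))
    return v / ((x1[-1] - x1[0]) * (x2[-1] - x2[0]))

def rvus_f(f_grid, x1, x2):
    """rVUS(f): standardized volume above the dose-response surface."""
    return vol(1.0 - f_grid, x1, x2)
```

For a posterior draw of \(\Delta\), the synergy and antagonism volumes would then be `vol(np.maximum(0.0, -delta), x1, x2)` and `vol(np.maximum(0.0, delta), x1, x2)`.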

Summarising large screens

When running screens with a large number of drug combinations, it is helpful to have a normalized measure for comparing synergy across experiments. The \(rVUS\) scores defined above are already standardized to their drug concentration ranges, but to compare across experiments, we also standardize with respect to the uncertainty in the model. To do this, we calculate a synergy score by normalizing \(rVUS(\Delta^{-})\) by its standard deviation. \[ \text{Synergy score} = \frac{\text{mean}(rVUS(\Delta^{-}))}{\text{sd}(rVUS(\Delta^{-}))}. \]
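Given posterior draws of \(rVUS(\Delta^{-})\), the score is simply a mean-to-standard-deviation ratio; a toy Python illustration with made-up draws:

```python
import numpy as np

# hypothetical posterior draws of rVUS(Delta^-) for a single experiment
draws = np.array([9.1, 10.4, 8.7, 11.2, 9.9])
synergy_score = draws.mean() / draws.std(ddof=1)
```

A large score means a large synergy volume relative to its posterior uncertainty, so experiments across a screen can be ranked on a common scale.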

A simple example – a single experiment

In the R package, we’ve attached two example datasets from a large drug combination screening experiment on diffuse large B-cell lymphoma. We’ll use these to show some simple use cases of the main functions and how to interpret the results.

Let’s load in the first example and have a look at it

library(bayesynergy)
data("mathews_DLBCL")
y = mathews_DLBCL[[1]][[1]]
x = mathews_DLBCL[[1]][[2]]
head(cbind(y,x))
##      Viability ibrutinib ispinesib
## [1,] 1.2295618    0.0000         0
## [2,] 1.0376006    0.1954         0
## [3,] 1.1813851    0.7812         0
## [4,] 0.5882688    3.1250         0
## [5,] 0.4666700   12.5000         0
## [6,] 0.2869514   50.0000         0

We see that the measured viability scores are stored in the vector y, while x is a matrix with two columns giving the corresponding concentrations at which the viability scores were read off.

Fitting the regression model is straightforward, and can be done with default settings by running the following code (where we add the names of the drugs involved, as well as the concentration units, for plotting purposes).

fit = bayesynergy(y,x, drug_names = c("ibrutinib", "ispinesib"),
                  units = c("nM","nM"))
## 
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.0002 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 2.48102 seconds (Warm-up)
## Chain 1:                2.59128 seconds (Sampling)
## Chain 1:                5.0723 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 6.6e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.66 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 2.32784 seconds (Warm-up)
## Chain 2:                1.47787 seconds (Sampling)
## Chain 2:                3.80571 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 6e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.6 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 2.20523 seconds (Warm-up)
## Chain 3:                2.42381 seconds (Sampling)
## Chain 3:                4.62903 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 6.3e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.63 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 2.49997 seconds (Warm-up)
## Chain 4:                5.02691 seconds (Sampling)
## Chain 4:                7.52688 seconds (Total)
## Chain 4:

The resulting model can be summarised by running

summary(fit)
##                mean  se_mean     sd      2.5%       50%  97.5% n_eff  Rhat
## la_1[1]       0.330 0.003036 0.0821  1.25e-01  3.42e-01  0.460   731 1.004
## la_2[1]       0.386 0.002718 0.0617  1.74e-01  3.96e-01  0.460   516 1.010
## log10_ec50_1  0.484 0.005910 0.1704  2.33e-01  4.49e-01  0.931   831 1.002
## log10_ec50_2 -1.024 0.019946 1.0066 -3.38e+00 -8.92e-01  0.488  2547 1.000
## slope_1       1.986 0.019985 0.9413  8.01e-01  1.76e+00  4.303  2218 1.003
## slope_2       1.427 0.018249 1.0459  1.14e-01  1.18e+00  4.153  3285 1.002
## ell           3.045 0.032709 1.5341  1.21e+00  2.70e+00  6.870  2200 1.000
## sigma_f       0.857 0.016475 0.8574  1.72e-01  6.17e-01  2.938  2708 1.000
## s             0.105 0.000276 0.0153  8.00e-02  1.03e-01  0.140  3057 1.002
## dss_1        37.873 0.139525 5.7369  2.67e+01  3.80e+01 49.024  1691 1.002
## dss_2        43.914 0.272526 7.8026  2.33e+01  4.53e+01 55.594   820 1.008
## rVUS_f       82.802 0.015023 0.9360  8.09e+01  8.28e+01 84.660  3882 1.000
## rVUS_p0      73.096 0.033670 2.3682  6.82e+01  7.32e+01 77.354  4947 1.000
## rVUS_Delta   -9.706 0.037315 2.5462 -1.50e+01 -9.59e+00 -4.980  4656 1.000
## rVUS_syn      9.760 0.036700 2.4970  5.24e+00  9.63e+00 15.016  4629 1.000
## rVUS_ant      0.054 0.002158 0.1267  5.06e-06  8.46e-05  0.439  3448 0.999
## 
## log-Pseudo Marginal Likelihood (LPML) =  52.58371

which gives posterior summaries of the parameters of the model. In addition, the model calculates summary statistics of the monotherapy curves and the dose-response surface, including drug sensitivity scores (DSS) for the two drugs in question, as well as the residual volumes under the surface (rVUS) that capture the notions of efficacy (rVUS_f), interaction (rVUS_Delta), synergy (rVUS_syn) and antagonism (rVUS_ant).

As indicated, the total combined drug efficacy is around 83% (rVUS_f), of which around 73 percentage points can be attributed to \(p_0\) (rVUS_p0), leaving roughly 10 percentage points of synergy (rVUS_syn). We can also note that the model is fairly certain of this effect, with a 95% credible interval of (5.24, 15.02).

We can also create plots by simply running

plot(fit, plot3D = F)

which produces monotherapy curves, monotherapy summary statistics, 2D contour plots of the dose-response function \(f\), the non-interaction assumption \(p_0\) and the interaction \(\Delta\). The last plot displays the \(rVUS\) scores as discussed previously, with corresponding uncertainty.

The package can also generate 3D interactive plots by setting plot3D = T. These are displayed using the plotly library (Plotly Technologies Inc. (2015)).

A simple example – a synergy screen

The synergyscreen function provides a workflow for data from large drug combination screens, where multiple drugs are tested in combination on multiple cell lines. It takes as input a list of experiments, each entry being a list containing the elements needed for a call to the main regression function bayesynergy.

Included in the package is the result of a synergyscreen run of 583 drug combinations on the A-375 human melanoma cell line from ONeil et al. (2016). The synergyscreen object is a list with two entries: a data frame with parameter estimates from each experiment, and a list entitled failed, containing experiments that either failed to process entirely, or had an unsatisfactory fit.

data("ONeil_A375")
length(ONeil_A375$failed)
## [1] 2

We see that the dataset has two experiments that failed to process during an initial run of synergyscreen. There is a multitude of reasons why an experiment might fail to process: an input error, initialization problems, or problems with the parallel processing.

The entries of failed are themselves lists, each containing the necessary information to process through the bayesynergy function

failed_experiment = ONeil_A375$failed[[1]]
names(failed_experiment)
## [1] "y"             "x"             "drug_names"    "experiment_ID"
head(cbind(failed_experiment$y,failed_experiment$x))
##      viability L778123 MK-4827
## [1,]     0.759   0.325   0.223
## [2,]     0.755   0.325   0.775
## [3,]     0.548   0.325   2.750
## [4,]     0.307   0.325  10.000
## [5,]     0.787   0.800   0.223
## [6,]     0.820   0.800   0.775

We can rerun experiments that failed to process by simply passing the returned synergyscreen object back into the function. Note that we turn off the default options of saving each fit and plotting everything, and set method = "vb", indicating that we use variational inference to fit the model.

fit_screen = synergyscreen(ONeil_A375, save_raw = F, save_plots = F, parallel = F, 
                           bayesynergy_params = list(method = "vb"))
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1:   This procedure has not been thoroughly tested and may be unstable
## Chain 1:   or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1: 
## Chain 1: 
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000231 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.31 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1: 
## Chain 1: Begin stochastic gradient ascent.
## Chain 1:   iter             ELBO   delta_ELBO_mean   delta_ELBO_med   notes 
## Chain 1:    100          -55.891             1.000            1.000
## Chain 1:    200          144.374             1.194            1.387
## Chain 1:    300          156.579             0.822            1.000
## Chain 1:    400          149.076             0.629            1.000
## Chain 1:    500          161.738             0.519            0.078
## Chain 1:    600          163.019             0.434            0.078
## Chain 1:    700          158.352             0.376            0.078
## Chain 1:    800          159.448             0.330            0.078
## Chain 1:    900          162.109             0.295            0.050
## Chain 1:   1000          160.243             0.267            0.050
## Chain 1:   1100          161.742             0.168            0.029
## Chain 1:   1200          160.932             0.029            0.016
## Chain 1:   1300          157.760             0.024            0.016
## Chain 1:   1400          162.193             0.021            0.016
## Chain 1:   1500          161.941             0.014            0.012
## Chain 1:   1600          162.219             0.013            0.012
## Chain 1:   1700          163.439             0.011            0.009   MEDIAN ELBO CONVERGED
## Chain 1: 
## Chain 1: Drawing a sample of size 1000 from the approximate posterior... 
## Chain 1: COMPLETED.
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1:   This procedure has not been thoroughly tested and may be unstable
## Chain 1:   or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1: 
## Chain 1: 
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000167 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.67 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
## Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1: 
## Chain 1: Begin stochastic gradient ascent.
## Chain 1:   iter             ELBO   delta_ELBO_mean   delta_ELBO_med   notes 
## Chain 1:    100         -296.035             1.000            1.000
## Chain 1:    200           18.381             9.053           17.105
## Chain 1:    300          119.532             6.317            1.000
## Chain 1:    400          144.867             4.782            1.000
## Chain 1:    500          145.816             3.827            0.846
## Chain 1:    600          146.854             3.190            0.846
## Chain 1:    700          148.744             2.736            0.175
## Chain 1:    800          150.175             2.395            0.175
## Chain 1:    900          150.858             2.130            0.013
## Chain 1:   1000          148.377             1.918            0.017
## Chain 1:   1100          148.899             1.819            0.013   MAY BE DIVERGING... INSPECT ELBO
## Chain 1:   1200          151.582             0.110            0.013
## Chain 1:   1300          153.429             0.027            0.012
## Chain 1:   1400          146.902             0.013            0.012
## Chain 1:   1500          152.777             0.017            0.013
## Chain 1:   1600          150.496             0.017            0.015
## Chain 1:   1700          150.406             0.016            0.015
## Chain 1:   1800          149.891             0.016            0.015
## Chain 1:   1900          152.208             0.017            0.015
## Chain 1:   2000          150.270             0.016            0.015
## Chain 1:   2100          150.225             0.016            0.015
## Chain 1:   2200          150.175             0.014            0.013
## Chain 1:   2300          145.867             0.016            0.015
## Chain 1:   2400          149.790             0.014            0.015
## Chain 1:   2500          152.800             0.012            0.015
## Chain 1:   2600          150.619             0.012            0.014
## Chain 1:   2700          150.871             0.012            0.014
## Chain 1:   2800          141.569             0.019            0.015
## Chain 1:   2900          149.191             0.022            0.020
## Chain 1:   3000          150.091             0.022            0.020
## Chain 1:   3100          146.830             0.024            0.022
## Chain 1:   3200          150.337             0.026            0.023
## Chain 1:   3300          152.835             0.025            0.022
## Chain 1:   3400          149.625             0.024            0.021
## Chain 1:   3500          152.647             0.024            0.021
## Chain 1:   3600          150.797             0.024            0.021
## Chain 1:   3700          150.945             0.024            0.021
## Chain 1:   3800          150.273             0.018            0.020
## Chain 1:   3900          144.451             0.017            0.020
## Chain 1:   4000          150.139             0.020            0.021
## Chain 1:   4100          152.244             0.019            0.020
## Chain 1:   4200          150.227             0.018            0.016
## Chain 1:   4300          151.294             0.017            0.014
## Chain 1:   4400          151.840             0.015            0.013
## Chain 1:   4500          152.833             0.014            0.012
## Chain 1:   4600          154.142             0.014            0.008   MEDIAN ELBO CONVERGED
## Chain 1: 
## Chain 1: Drawing a sample of size 1000 from the approximate posterior... 
## Chain 1: COMPLETED.

We can also plot the result of the screen:

plot(fit_screen)

Diagnosing errors and warnings

Sometimes, the bayesynergy function may return with a warning. Ideally, we don’t want any warnings at all, and they should be examined closely, as the posterior samples could be unreliable. Usually, the warning will tell the user how to fix the problem at hand, e.g. by running the chains for longer (setting iter higher), or setting adapt_delta higher. See https://mc-stan.org/misc/warnings.html for some general tips.

Divergent Transitions

Most commonly, the sampler might complain about divergent transitions. The warning will typically look like this:

## Warning: There were 2316 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.

This is indicative of a posterior geometry that is tricky to explore. When there are only a few divergent transitions, the usual trick is to set adapt_delta to a value larger than its default of 0.9. This can be done through the control option in the bayesynergy call:

fit = bayesynergy(y, x, control = list(adapt_delta = 0.99))

However, the case above, with 2316 divergent transitions, is indicative of a misspecified model. In my experience, this can happen for a few reasons.

References

Cremaschi, Andrea, Arnoldo Frigessi, Kjetil Taskén, and Manuela Zucknick. 2019. “A Bayesian Approach to Study Synergistic Interaction Effects in in-Vitro Drug Combination Experiments.” http://arxiv.org/abs/1904.04901.

Flaxman, Seth, Andrew Gelman, Daniel Neill, Alex Smola, Aki Vehtari, and Andrew Gordon Wilson. 2015. “Fast Hierarchical Gaussian Processes.” Manuscript in Preparation.

ONeil, Jennifer, Yair Benita, Igor Feldman, Melissa Chenard, Brian Roberts, Yaping Liu, Jing Li, et al. 2016. “An Unbiased Oncology Compound Screen to Identify Novel Combination Strategies.” Molecular Cancer Therapeutics 15 (6): 1155–62. https://doi.org/10.1158/1535-7163.mct-15-0843.

Plotly Technologies Inc. 2015. “Collaborative Data Science.” Montreal, QC: Plotly Technologies Inc. 2015. https://plot.ly.

Stan Development Team. 2020. “RStan: The R Interface to Stan.” http://mc-stan.org/.

Yadav, Bhagwan, Tea Pemovska, Agnieszka Szwajda, Evgeny Kulesskiy, Mika Kontro, Riikka Karjalainen, Muntasir Mamun Majumder, et al. 2014. “Quantitative Scoring of Differential Drug Sensitivity for Individually Optimized Anticancer Therapies.” Scientific Reports 4 (1). https://doi.org/10.1038/srep05193.